We introduce a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises a number of simple multi-instrument classical music pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, we provide the musical score in MIDI format, the audio recordings of the individual tracks, the audio and video recording of the assembled mixture, and ground-truth annotation files including frame-level and note-level transcriptions. We describe our methodology for the creation of this dataset, particularly highlighting our approaches for addressing the challenges involved in maintaining synchronization and naturalness. We compare the synchronization quality of this dataset with that of existing widely used music audio datasets and show that it achieves high quality. We anticipate that the dataset will be useful for the development and evaluation of many existing music information retrieval (MIR) tasks, as well as many novel multi-modal tasks. To this end, we benchmark this dataset against existing music audio datasets on two existing MIR tasks (multi-pitch analysis and score-informed source separation). We also define two novel multi-modal MIR tasks (visually informed multi-pitch analysis and polyphonic vibrato analysis), and provide evaluation measures and baseline systems for future comparisons. Finally, we propose several emerging research directions that can be supported by this dataset.